Python for audio processing#
All code-related materials in this tutorial are written in Python. There are several relevant courses and learning resources you may refer to for learning the basics of Python for audio signal processing; we want to highlight the Coursera course Audio Signal Processing for Music Applications and the AudioLabs-Erlangen FMP Notebooks.
In this section we provide an overview of the very basics of Python for digital processing of audio signals, hoping that it serves as a useful entry point to this tutorial for readers who do not have much experience in this particular topic. If you are already familiar with it, this section may still be useful, since it showcases the relevant tools used throughout the tutorial.
Discretizing audio signals#
In a real-world scenario, sound is produced by pressure waves: air vibrations that are processed by our ears and converted into what we hear. From a computational point of view, we have to convert these signals into a measurable representation that a machine is able to read and process. In other words, these signals must be digitized, that is, the continuous (or analog) sound must be discretized into a representable sequence of values. See an example in Fig. 1.
Fig. 1 We represent the continuous signal using a sequence of points (image from sonimbus.com).#
Note
The time step between points is given by the sampling frequency (or sampling rate), a value given in Hertz (Hz) that indicates how many samples per second we use to represent the continuous signal. Common values for music signals are 8kHz, 16kHz, 22.05kHz, 44.1kHz, and 48kHz.
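To build intuition about discretization and sampling rate, the sketch below samples one second of a continuous 440 Hz sine (a hypothetical test tone, not the tutorial audio) at two of the common rates above and compares the number of samples obtained:

```python
import numpy as np

def discretize_sine(freq_hz, duration_s, sample_rate):
    """Sample a continuous sine of freq_hz Hz at sample_rate samples per second."""
    # One time instant per sample: duration_s * sample_rate points in total
    t = np.arange(int(duration_s * sample_rate)) / sample_rate
    return np.sin(2 * np.pi * freq_hz * t)

# One second of a 440 Hz tone at two of the common sampling rates above
sine_44100 = discretize_sine(440, 1.0, 44100)
sine_8000 = discretize_sine(440, 1.0, 8000)

print(len(sine_44100))  # 44100 samples represent one second at 44.1kHz
print(len(sine_8000))   # 8000 samples represent the same second at 8kHz
```

Both arrays represent the same one-second tone; the higher sampling rate simply uses more points per second to approximate the continuous signal.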
The important point here is that these captured discrete values can be easily loaded and used in a Python project, normally represented as an array of data points. We can use several different libraries and pipelines to load and process audio signals in a Python environment. As you may expect, the computational analysis of Indian Art Music builds on top of representations of the audio signal from which we extract musicologically relevant information; it is therefore important to understand how we handle these data.
Loading an audio signal#
We will now load an audio signal to see what it looks like in a Python development scenario. We will use Essentia, an audio processing library written in C++ with Python bindings. Therefore it is fast! Let's first install Essentia in our environment.
%pip install essentia
Let’s now import the Essentia standard module, which includes several different algorithms for multiple purposes. Check out the documentation for further detail.
import essentia
import essentia.standard as estd
# Let's also import util modules for data processing and visualisation
import os
import math
import datetime
import numpy as np
import matplotlib.pyplot as plt
import IPython.display as ipd
We will now use the MonoLoader algorithm, which, as the name suggests, loads an audio signal in mono.
Note
As you may know, audio signals are often stored in two channels (stereo), each carrying different information. This is typically used in modern music recording systems to mix and pan the different sources away from the center.
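A common way to downmix a stereo signal to mono is to average the two channels sample by sample. The NumPy sketch below illustrates the idea on a synthetic two-channel array (this is an illustration only, not Essentia's actual implementation):

```python
import numpy as np

# Synthetic stereo signal: shape (num_samples, 2), one column per channel
stereo = np.stack([
    np.linspace(-1.0, 1.0, 1000),   # left channel
    np.linspace(1.0, -1.0, 1000),   # right channel
], axis=1)

# Downmix to mono by averaging the two channels at each sample
mono = stereo.mean(axis=1)

print(stereo.shape)  # (1000, 2)
print(mono.shape)    # (1000,)
```

Note that here the two channels are exact opposites, so the downmix cancels to silence: a reminder that averaging channels can lose out-of-phase content.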
audio = estd.MonoLoader(
filename=os.path.join("..", "audio", "sharanu_janakana.wav")
)() # Loading audio at sampling rate of 44100 (default)
print(np.shape(audio))
(10584000,)
By default, MonoLoader() resamples the input audio to 44.1kHz. With this in mind, we can compute the actual duration of the piece in seconds by dividing the total number of samples in the signal by the sampling frequency (which, after all, indicates how many samples we use to represent one second of audio).
duration = math.ceil(len(audio) / 44100)
# Printing duration in h/m/s format
print(str(datetime.timedelta(seconds=duration)).split(".")[0])
0:04:00
Our variable audio is now a one-dimensional array of values, each of these values representing the points we have used to discretize the analog signal.
Important
Most signal processing libraries load audio signals at the sampling rate originally used when the audio was digitized. However, we can resample the audio at loading time if needed.
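For integer downsampling factors, the effect of resampling can be illustrated by simply keeping every n-th sample. This naive decimation is a sketch only (proper resamplers, such as those used by audio libraries, first apply an anti-aliasing low-pass filter), but it shows why halving the rate halves the length:

```python
import numpy as np

sample_rate = 44100
# 4 seconds of random noise standing in for a loaded audio signal
signal = np.random.default_rng(0).standard_normal(sample_rate * 4)

factor = 2  # 44100 -> 22050
decimated = signal[::factor]  # keep every 2nd sample (no anti-aliasing filter!)

print(len(signal) / len(decimated))              # 2.0
print(len(decimated) / (sample_rate // factor))  # 4.0 -> still 4 seconds
```

The decimated signal has half as many samples, yet divided by its new sampling rate it still spans the same 4 seconds.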
audio_22050 = estd.MonoLoader(
filename=os.path.join("..", "audio", "sharanu_janakana.wav"),
sampleRate=22050
)() # Loading audio at sampling rate of 22050
duration_22050 = math.ceil(len(audio_22050) / 22050) # We use sampling rate of 22050 now
# Printing duration of audio sampled at 22050
print(str(datetime.timedelta(seconds=duration_22050)).split(".")[0])
# Length ratio for fs=44100 and 22050
print(round(len(audio) / len(audio_22050), 3))
0:04:00
2.0
As expected, the duration in seconds is the same as before. However, using a sampling rate of 22.05kHz, the actual length in samples of the audio signal is exactly half the length of the signal sampled at 44.1kHz. That is because we are now using half as many samples to represent each second of audio.
Visualising the audio signal#
Visualising (and listening to!) our data is very important, and Python can help you with that. Listening to our data within a Python environment can be achieved in many ways. If you are working with Jupyter notebooks, you can use IPython.display.
audio = estd.MonoLoader(
filename=os.path.join("..", "audio", "sharanu_janakana.wav")
)()
ipd.Audio(audio, rate=44100)
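To visualise the waveform itself, we can plot amplitude against a time axis derived from the sampling rate. The sketch below uses a synthetic decaying tone so it runs without the tutorial's audio file; with the loaded `audio` array you would pass it to `ax.plot` in the same way:

```python
import numpy as np
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; not needed inside a notebook
import matplotlib.pyplot as plt

sample_rate = 44100
# Stand-in for the loaded audio: a decaying 220 Hz tone (hypothetical signal)
t = np.arange(2 * sample_rate) / sample_rate  # 2 seconds of time stamps
signal = np.sin(2 * np.pi * 220 * t) * np.exp(-t)

fig, ax = plt.subplots(figsize=(10, 3))
ax.plot(t, signal, linewidth=0.5)
ax.set_xlabel("Time (s)")
ax.set_ylabel("Amplitude")
ax.set_title("Waveform")
# In a Jupyter notebook the figure renders inline; in a script, call plt.show()
```

Dividing the sample indices by the sampling rate converts the x-axis from samples to seconds, which makes the plot directly comparable across signals with different sampling rates.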